Ant Group, in collaboration with Peking University, has released a benchmark for evaluating large language models in the DevOps domain. The benchmark comprises 4850 multiple-choice questions spanning 8 categories, including planning, coding, building, testing, and releasing. It also provides detailed evaluations for AIOps tasks, where results show only small score differences among the evaluated models.
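For a multiple-choice benchmark like this, evaluation typically reduces to per-category and overall accuracy. The sketch below illustrates that style of scoring; the category names, item format, and function name are hypothetical illustrations, not the benchmark's actual data format or API.

```python
# Minimal sketch of multiple-choice accuracy scoring by category.
# Item structure and categories here are hypothetical examples.
from collections import defaultdict

def score_by_category(items):
    """Compute per-category and overall accuracy for MCQ items.

    Each item is a dict with 'category', 'answer' (gold label),
    and 'prediction' (model output), e.g. letters 'A'-'D'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for it in items:
        total[it["category"]] += 1
        if it["prediction"] == it["answer"]:
            correct[it["category"]] += 1
    per_cat = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_cat, overall

# Toy example with three items across two categories.
items = [
    {"category": "planning", "answer": "A", "prediction": "A"},
    {"category": "planning", "answer": "B", "prediction": "C"},
    {"category": "coding",   "answer": "D", "prediction": "D"},
]
per_cat, overall = score_by_category(items)
```

Reporting accuracy per category, as well as overall, is what makes it possible to compare how closely models cluster on each task type.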